AITopics | video data

Collaborating Authors

video data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

e6c9671ed3b3106b71cafda3ba225c1a-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsApr-30-2026, 03:23:41 GMT

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

Europe (0.46)
Asia > Japan (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Leisure & Entertainment (0.94)
Law (0.92)
(2 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Vision (0.70)
(5 more...)

Add feedback

TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment

Neural Information Processing SystemsMar-17-2026, 23:11:37 GMT

Recent advancements in image understanding have benefited from the extensive use of web image-text pairs. However, video understanding remains a challenge despite the availability of substantial web video-text data. This difficulty primarily arises from the inherent complexity of videos and the inefficient language supervision in recent web-collected video-text datasets. In this paper, we introduce Text-Only Pre-Alignment (TOPA), a novel approach to extend large language models (LLMs) for video understanding, without the need for pre-training on real video data. Specifically, we first employ an advanced LLM to automatically generate Textual Videos comprising continuous textual frames, along with corresponding annotations to simulate real video-text data.

artificial intelligence, large language model, natural language, (9 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

e6c9671ed3b3106b71cafda3ba225c1a-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsFeb-17-2026, 17:11:16 GMT

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Finland > Pirkanmaa > Tampere (0.05)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Leisure & Entertainment (0.94)
Law (0.92)
(2 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Vision (0.70)
(5 more...)

Add feedback

OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation

Neural Information Processing SystemsFeb-10-2026, 19:51:18 GMT

Tokenizer, serving as a translator to map the intricate visual data into a compact latent space, lies at the core of visual generative models.

diffusion model, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

BehaveNet: nonlinear embedding and Bayesian neural decoding of behavioral videos

Neural Information Processing SystemsDec-25-2025, 19:51:10 GMT

A fundamental goal of systems neuroscience is to understand the relationship between neural activity and behavior. Behavior has traditionally been characterized by low-dimensional, task-related variables such as movement speed or response times. More recently, there has been a growing interest in automated analysis of high-dimensional video data collected during experiments. Here we introduce a probabilistic framework for the analysis of behavioral video and neural activity. This framework provides tools for compression, segmentation, generation, and decoding of behavioral videos.

bayesian neural, behavioral video, name change, (7 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Therapeutic Area > Neurology (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.37)

Add feedback

OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation

Neural Information Processing SystemsOct-9-2025, 22:42:59 GMT

Tokenizer, serving as a translator to map the intricate visual data into a compact latent space, lies at the core of visual generative models.

diffusion model, omnitokenizer, tokenizer, (15 more...)

Neural Information Processing Systems

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

Direct Video-Based Spatiotemporal Deep Learning for Cattle Lameness Detection

Sohan, Md Fahimuzzman, Alzubi, Raid, Alzoubi, Hadeel, Albalawi, Eid, Hafez, A. H. Abdul

arXiv.org Artificial IntelligenceSep-19-2025

Cattle lameness is a prevalent health problem in livestock farming, often resulting from hoof injuries or infections, and severely impacts animal welfare and productivity. Early and accurate detection is critical for minimizing economic losses and ensuring proper treatment. This study proposes a spatiotemporal deep learning framework for automated cattle lameness detection using publicly available video data. We curate and publicly release a balanced set of 50 online video clips featuring 42 individual cattle, recorded from multiple viewpoints in both indoor and outdoor environments. The videos were categorized into lame and non-lame classes based on visual gait characteristics and metadata descriptions. After applying data augmentation techniques to enhance generalization, two deep learning architectures were trained and evaluated: 3D Convolutional Neural Networks (3D CNN) and Convolutional Long-Short-Term Memory (ConvLSTM2D). The 3D CNN achieved a video-level classification accuracy of 90%, with a precision, recall, and F1 score of 90.9% each, outperforming the ConvLSTM2D model, which achieved 85% accuracy. Unlike conventional approaches that rely on multistage pipelines involving object detection and pose estimation, this study demonstrates the effectiveness of a direct end-to-end video classification approach. Compared with the best end-to-end prior method (C3D-ConvLSTM, 90.3%), our model achieves comparable accuracy while eliminating pose estimation pre-processing.The results indicate that deep learning models can successfully extract and learn spatio-temporal features from various video sources, enabling scalable and efficient cattle lameness detection in real-world farm settings.

artificial intelligence, machine learning, video, (17 more...)

arXiv.org Artificial Intelligence

2504.16404

Country: Asia (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Food & Agriculture > Agriculture (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

UL-DD: A Multimodal Drowsiness Dataset Using Video, Biometric Signals, and Behavioral Data

Bodaghi, Morteza, Hosseini, Majid, Gottumukkala, Raju, Bhupatiraju, Ravi Teja, Ahmad, Iftikhar, Gabbouj, Moncef

arXiv.org Artificial IntelligenceJul-21-2025

In this study, we present a comprehensive public dataset for driver drowsiness detection, integrating multimodal signals of facial, behavioral, and biometric indicators. Our dataset includes 3D facial video using a depth camera, IR camera footage, posterior videos, and biometric signals such as heart rate, electrodermal activity, blood oxygen saturation, skin temperature, and accelerometer data. This data set provides grip sensor data from the steering wheel and telemetry data from the American truck simulator game to provide more information about drivers' behavior while they are alert and drowsy. Drowsiness levels were self-reported every four minutes using the Karolinska Sleepiness Scale (KSS). The simulation environment consists of three monitor setups, and the driving condition is completely like a car. Data were collected from 19 subjects (15 M, 4 F) in two conditions: when they were fully alert and when they exhibited signs of sleepiness. Unlike other datasets, our multimodal dataset has a continuous duration of 40 minutes for each data collection session per subject, contributing to a total length of 1,400 minutes, and we recorded gradual changes in the driver state rather than discrete alert/drowsy labels. This study aims to create a comprehensive multimodal dataset of driver drowsiness that captures a wider range of physiological, behavioral, and driving-related signals. The dataset will be available upon request to the corresponding author.

artificial intelligence, drowsiness, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2507.13403

Country:

North America > United States (0.28)
Europe (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (0.94)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
(2 more...)

Add feedback

Detection of Rail Line Track and Human Beings Near the Track to Avoid Accidents

Hosain, Mehrab, Kapoor, Rajiv

arXiv.org Artificial IntelligenceJul-8-2025

This paper presents an approach for rail line detection and the identification of human beings in proximity to the track, utilizing the YOLOv5 deep learning model to mitigate potential accidents. The technique incorporates real-time video data to identify railway tracks with impressive accuracy and recognizes nearby moving objects within a one-meter range, specifically targeting the identification of humans. This system aims to enhance safety measures in railway environments by providing real-time alerts for any detected human presence close to the track. The integration of a functionality to identify objects at a longer distance further fortifies the preventative capabilities of the system. With a precise focus on real-time object detection, this method is poised to deliver significant contributions to the existing technologies in railway safety. The effectiveness of the proposed method is demonstrated through a comprehensive evaluation, yielding a remarkable improvement in accuracy over existing methods. These results underscore the potential of this approach to revolutionize safety measures in railway environments, providing a substantial contribution to accident prevention strategies.

artificial intelligence, detection, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-981-97-2508-3_56

2507.0304

Genre: Research Report > New Finding (0.48)

Industry: